The shared OCI cache at data_dir/system/oci-cache grew without bound
because neither the pull path nor the registry push path had a cleanup
hook. The image retention controller only touches data_dir/images, so
manifests and layer blobs that were no longer referenced lived forever.
This change adds a new lib/ocicachegc package that walks index.json and
every referenced manifest to build the live set of blob digests, then
deletes any file under blobs/sha256/ that is not in that set. Blobs
whose mtime is within the configured min_blob_age are kept; this grace
period is what lets the sweep run safely alongside concurrent pulls
(which write layer blobs before updating index.json) and registry
pushes.
Disabled by default. Enable via:
images:
  oci_cache_gc:
    enabled: true
    interval: 1h
    min_blob_age: 1h
Firetiger deploy monitoring skipped: this PR didn't match the auto-monitor filter configured on your GitHub connection.
Reason: PR adds a new garbage collection package for OCI cache management, but does not modify API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal), which are the specific areas the filter requires for monitoring.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8c46b4d.
Previously walkDescriptor added subject.Digest to the live set as a leaf without descending, so the subject manifest's own config and layers could be swept. Recurse into the subject the same way as manifests[] entries so the full referrer chain stays marked.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Summary
The shared OCI cache at data_dir/system/oci-cache currently grows without bound: neither the pull path (layout.AppendImage) nor the registry push path (BlobStore.Put) ever removes blobs, and the image retention controller only touches data_dir/images. Over time this accumulates dead manifest, config, and layer blobs that are no longer reachable from index.json.
This change adds a new lib/ocicachegc package that walks index.json and every referenced manifest to build the set of live blob digests, then deletes any file under blobs/sha256/ that isn't in that set.
Blobs whose mtime is within the configured min_blob_age are always kept; that grace period is what lets the sweep run safely alongside concurrent pulls (which write layer blobs before updating index.json) and registry pushes (which rename <hex>.tmp → <hex> before the manifest trigger).
Config
Disabled by default. Opt-in via:
images:
  oci_cache_gc:
    enabled: true
    interval: 1h
    min_blob_age: 1h
How it decides what's live
Start from the descriptors in index.json.
If the blob is a manifest or manifest index, recurse into its config, layers, manifests, and subject references.
Unparseable or missing referenced blobs are treated as opaque leaves: they remain "live" but we don't descend into them. The collector never deletes a blob it cannot prove is dead.
.tmp files and anything whose name is not a 64-hex-char blob digest are ignored by the sweep entirely.
Metrics
hypeman_oci_cache_gc_sweeps_total (counter, status)
hypeman_oci_cache_gc_sweep_duration_seconds (histogram)
hypeman_oci_cache_gc_deleted_blobs_total (counter)
hypeman_oci_cache_gc_deleted_bytes_total (counter)
Test plan
go test ./lib/ocicachegc/... passes (live set kept, orphans deleted, grace period honored, tmp/non-blob filenames ignored, manifest index traversal)
go test ./cmd/api/config/... passes (new duration validators)
go test ./lib/imageretention/... passes (unchanged)
go build ./cmd/api/... clean
go vet ./... clean
Manual validation
deft-kernel-dev, ran the real hypeman binary from a fresh scratch clone with images.oci_cache_gc.enabled: true and an isolated temp data_dir.
data_dir/system/oci-cache with one live manifest/config/layer set, one old orphan blob, and one recent orphan blob.
oci cache gc enabled, oci cache gc started, and deleted unreferenced oci blob for the old orphan digest.
go mod download, make oapi-generate, make build, go run ./cmd/test-prewarm, go test -count=1 -tags containers_image_openpgp -timeout=20m ./... (pass, 300s).
Note
Medium Risk
Adds a background garbage collector that deletes unreferenced blobs from data_dir/system/oci-cache, which is disk-destructive if the live-set computation or age gating is wrong. Disabled by default, but enabling it impacts runtime behavior and storage integrity.
Overview
Adds an opt-in mark-and-sweep garbage collector (lib/ocicachegc) to reclaim disk space in the shared OCI cache (data_dir/system/oci-cache) by deleting blob files under blobs/sha256/ that are not reachable from index.json, with a min_blob_age grace period to avoid races with concurrent pulls/pushes.
Wires the collector into the API server startup as a background goroutine when images.oci_cache_gc.enabled is set, and introduces new config defaults + validation for images.oci_cache_gc.interval and images.oci_cache_gc.min_blob_age (with example config/docs updates and new OTel metrics for sweep status, duration, and bytes/blobs deleted).
Reviewed by Cursor Bugbot for commit a4a40ff.